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I.  INTRODUCTION 


Accurate  prediction  of  the  three-dimensional  flow  over  a  projectile  is  essential  for  ob¬ 
taining  aerodynamic  design  parameters  such  as  body  forces  and  moments.  A  bibliography 
and  discussion  of  experimental  and  numerical  work  as  applied  to  projectile  aerodynamics 
has  been  reported  by  Sturek,1  and  Nietubicz  and  Sturek.2  Although  many  design  param¬ 
eters  can  be  calculated  with  acceptable  accuracy  for  simple  geometries,  much  remains  to 
be  done  for  projectile  shapes  which  include  such  geometry  features  as  flattened  nose  tips, 
rotating  bands,  undercuts,  irregular  base  configurations  and  fins. 

This  report  outlines  the  development  of  a  three-dimensional  Navier-Stokes  code  which 
uses  explicit  time-marching  and  zonal  gridding.  The  development  of  this  code  was  initiated 
in  order  to  explore  techniques  for  performing  CFD  computations  utilizing  the  vector  and 
parallel  features  of  advanced  computer  architectures.  The  initial  development  was  carried 
out  for  two-dimensional,  axisymmetric  flow.3 

The  results  achieved,  as  reported  in  Reference  3,  were  sufficiently  encouraging  from 
the  standpoint  of  computational  efficiency  and  accuracy  of  the  flow  field  predictions  for 
highly  complex  flows  to  justify  extension  of  the  code  to  achieve  a  three-dimensional  mod¬ 
eling  capability.  This  code  incorporates  a  fourth  order  dissipation  and  variable,  local, 
time-stepping  within  MacCormackV  predictor-corrector  algorithm  to  accelerate  the  con¬ 
vergence.  A  non-reflecting  outer  boundary  is  implemented  in  order  to  reduce  the  total 
number  of  grid  nodes  required  for  accurate  supersonic  flow  field  computations.  The  code 
is  fully  vectorizable.  Also,  the  code  efficiently  exploits  multiprocessor  architectures  for 
parallel  execution. 

An  area  of  particular  interest  in  projectile  aerodynamics  is  the  nose  tip.5  The  findings 
of  Reference  5  indicate  that  bluntness  of  10%  or  less  has  no  significant  effect  on  the  pitch 
plane  aerodynamics  of  shell  at  supersonic  velocities;  however,  bluntness  of  10%  was  found 
to  have  a  significant  (25%  at  Mach  3)  influence  on  the  Magnus  effect.  Projectile  nose  tips 
tire  generally  blunt  with  a  flattened  (meplate)  front  face.  Typical  nose  bluntness  (ratio  of 
the  meplate  diameter  to  the  maximum  diameter  of  the  shell)  for  artillery  shell  is  10-30%. 
Because  of  the  bluntness,  at  supersonic  speeds,  the  flow  field  contains  a  detached  shock 
wave  which  results  in  a  region  of  subsonic  flow  at  the  nose  of  the  projectile.  An  experimental 
study  on  the  influence  of  nose  bluntness  for  a  projectile  was  reported  by  Dolling  and  Gray.6  7 
These  data  provide  a  challenging  test  case  for  computational  aerodynamics  since  effects 
of  the  blunt  nose  are  observed  in  the  measured  velocity  profiles  at  downstream  positions 
located  between  three  and  six  calibers  from  the  nose. 

This  report  describes  the  evaluation  of  the  capability  of  the  new  code  to  achieve 
accurate  results  and  examines  the  degree  of  difficulty  required  to  perform  the  computations. 
The  computational  performance  achieved  through  the  efficient  utilization  of  the  vector  and 
parallel  architecture  on  the  Cray  2  and  the  Cray  Y-MP  supercomputers  is  defined. 
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II.  GOVERNING  EQUATIONS 


The  compressible,  Reynolds-averaged,  Navier-Stokes  equations  in  a  3-D  generalized 
coordinate  system  are  written  in  the  following  strong  conservation  form.  The  dependent 
variables  p,u,v,w  and  e  are  mass  averaged,  where  e  is  specific  total  energy,  T  temperature. 
p  mean  density,  p  pressure,  and  t  time. 

OQ  d(E-S)  d(F-T)  d(G-R) 

dt  dt  dp  dC,  [  1 

The  generalized  coordinates  are: 

£  =  £(x,y,.z)  -  longitudinal  coordinate 
77  =  r)(x,  y,  z)  -  near  normal  coordinate 
£  =  £(x,y,z)  -  circumferential  coordinate 
t  =  t  -  time  . 

The  flux  vectors  are  defined  as: 
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Tn  ~  ~  2  (P  +  e)  ( ux  +  vy  +  wz)  4-  2  (p  4-  e)  ux 


Txy  —  Tyx  —  {p  4  t)  (Uy  -j-  Vx) 
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Txz  =  Tzx  =  (fi+e)  (u,  +  wx) 

2 

rvv  =  -  -  (^  +  £)  (ux  +  vy  +  wz)  +  2(fi  +  e)vy 
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Tyz  =Tzy  =  (n  +  e)(v2  +  Wy) 

2 
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The  molecular  viscosity  is  /x  and  the  turbulent  viscosity  is  e.  The  Jacobian  of  the  general¬ 
ized  coordinates  is: 

J~l  =  +  X^ytZr,  -  X^Zr,  -  X^Z^  -  X^Z^. 

The  velocities  in  the  rj  and  £  coordinates  are: 

U  =  fxu  +  +  &w 

V  =  rjxu  +  rjyV  +  t)zw  (2) 

w  =  £xli  +  CyV  +  CzW 

which  represent  the  contravariant  velocity  components. 

The  air  was  assumed  to  be  a  perfect  gas,  satisfying  the  equation  of  state 

p  =  pRT 

3  ' 


(3) 


where  R  is  the  gas  constant  (1716  ft2/sec2  -  °R  for  air).  For  the  dependence  of  laminar 
viscosity  on  the  temperature,  Sutherland’s  law  was  used: 


P 


2.270 


J3/2 

T+  198.6 


x  10"8 


lb  —  sec 
ft 2 


(4) 


The  laminar  and  turbulent  Prandtl  numbers  Pr  and  Pr,,  were  assumed  constant  with 
values  of  0.72  and  0.9,  respectively.  The  ratio  of  specific  heats  was  also  assumed  constant 
and  equal  to  1.4.  The  specific  heat  capacities  at  constant  volume  and  constant  pressure 
tire  Cv  and  Cp,  respectively. 


(Cv  =  4290  ft2 /sec2  -  °R  and  Cp  =  6006  ft2/sec2  -  °R  for  air)  . 


The  total  energy  per  unit  mass,  e,  is  given  by 


e  —  CVT  -(-  0.5(u2  +  u2  -j-  u>2). 

The  local  pressure  is  determined  using  the  following  relation 


P  =  p(l  —  1)  e  —  0.5(u2  4-  u2  +  tu2)] 


(5) 


(6) 


III.  NUMERICAL  METHOD 

1.  COMPUTATIONAL  ALGORITHM 

MacCormack’s4  explicit  and  unsplit  method  is  utilized  for  numerical  integration  of  the 
governing  Equations  (1)  in  time  from  an  assumed  initial  condition  until  a  steady  solution 
is  obtained.  The  finite-difference  method  for  the  one-dimensional  equation: 


di  d( 


is  given  by  the  following  predictor  -  corrector  steps: 


=  Qh  -  T7  (£%  - 


(7) 


(8) 


4 


(9) 


Qh  +  Q 


n+1 


At 


where  E^f1  implies  that  the  terms  are  evaluated  using  Q”*1  and  so  forth.  After  com¬ 
pletion  of  the  above  described  two  steps,  first  derivatives  of  the  governing  equations  are 
approximated  by  second-order  accurate  central  differences. 

The  reason  for  using  the  unsplit  method  over  the  time-split  method  is  to  save  the 
number  of  accessions  of  the  memory.  In  other  words,  for  advancing  one  time-step,  the 
unsplit  method  requires  considerably  fewer  accessions  to  the  memory  than  the  time-split 
method.  For  the  explicit  method  the  time-step  size  must  not  exceed  the  maximum  al¬ 
lowed  by  the  CFL  condition.  An  approximate  linearized  stability  analysis  for  the  inviscid 
equations  yields  the  following: 


At  = 


A£  A  77  AC 


+  c 


1  ( 4.  1  Cy 

VAC  At?  +  AC 


(  ^ 2  4-  -A-  k 
\AC  At?  +  AC 


(10) 


where  c  is  the  speed  of  sound.  Since  the  terms  involving  molecular  and  eddy  viscosity 
stabilize  the  solution,  the  time-step  size  computed  using  the  inviscid  analysis  was  found 
stable  for  both  inviscid  and  viscous  applications.  Equation  (10)  was  multiplied  by  a  factor 
(denoted  by  CFL)  that  is  slightly  less  than  one. 

For  a  truly  time  dependent  solution,  At  obtained  from  Equation  (10)  must  be  the 
minimum  value  of  the  At  for  all  field  cells.  Because  of  the  non-uniformity  of  the  grid 
spacing  for  the  viscous  flow  calculations,  the  values  of  At  can  vary  substantially  within 
the  field.  Hence,  the  time  dependent  solution  can  converge  very  slowly  if  the  convergence 
is  constrained  by  the  most  restrictive  condition  within  the  flow  field.  However,  when  the 
steady  state  solution  is  of  interest,  a  substantial  improvement  in  the  convergence  rate 
can  be  obtained  by  using  locally  variable  time  steps.  In  other  words,  instead  of  using  a 
minimum  At  of  all  field  cells,  each  cell  uses  its  local  maximum  allowable  At. 


2.  NUMERICAL  DAMPING 

Flows  containing  strong  shock  waves  cause  numerical  oscillations.  In  order  to  con¬ 
trol  these  oscillations,  two  artificial  dissipation  terms  were  incorporated  into  the  present 
numerical  procedure.  The  resulting  artificial  dissipation  term  is  a  blending  of  second  and 
fourth  order  differences.  That  is 
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(11) 


D  =  {D\  +  Dl  +  Dl-D\-D*~  D'()  . 

The  second  and  fourth  order  differences  for  the  dissipation  axe  obtained  using  the  following 
equations. 


D 


2 


at  ^  ae 


(12) 


&Q 

(i  +  v)\  ae 


(13) 


where 


Af  =  |C7|  +  (^  +  {J  +  £,J)ic 


Note  that  Equation  (12)  is  the  damping  suggested  by  MacCormack,4  which  is  only  of 
significant  m?  nitude  in  regions  of  pressure  oscillations.  This  local  effect  prevents  unreal¬ 
istic  oscillations  in  the  neighborhood  of  shockwaves. 

It  is  well  known  that  central-difference  schemes  experience  odd  and  even  point  de¬ 
coupling  for  both  linear  and  non-linear  problems.  These  high  frequency  modes  must  be 
damped  to  enhance  the  convergence  rate.  The  new  terms,  introduced  by  Equation  (13) 
into  the  MacCormack  algorithm,  provide  high  frequency  damping.  The  improvement  in 
the  convergence  rate  that  can  be  obtained  for  a  class  of  problems  will  be  reported  in  a 
separate  follow-up  report. 

The  artificial  dissipation  terms  were  conveniently  handled  by  adding  them  to  appro¬ 
priate  flux  vectors  E,  F  and  G  of  Equation  (1). 

3.  ZONAL  GRIDDING 

The  problem  of  computing  the  external  flow  field  over  a  blunt  nosed  body  is  complex 
because  of  the  presence  of  sharp  corners.  A  conventional  wrap-around  grid  requires  round¬ 
ing  of  these  sharp  corners  and  can  result  in  rapid  variation  of  the  metric  terms.  However, 
this  complex  geometry  can  be  efficiently  gridded  using  the  zonal  grid  approach.  The  com¬ 
plex  nose  geometry  is  broken  into  two  zones  of  simple  geometric  shape.  In  each  zone  an 
algebraic  grid  generation  technique  is  used  with  grid  clustering  near  the  surface. 

A  simple  zone  coupling  technique  has  been  used  in  the  present  work.  In  this  technique, 
zonal  grids  share  one  grid  cell  boundary  with  geometric  continuity  of  at  least  one  grid  cell 
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for  overlapped  zones.  The  coupling  of  zones  is  obtained  by  using  one  grid-cell  overlap.  This 
zonal- coupling  is  simple  and  transparent  to  shockwaves  and  separated  boundary  layers. 
The  transparency  of  zonal-coupling  is  important  because  the  initial  conditions  are  very 
far  from  the  steady  state  and,  during  the  transient  phase,  shocks  may  travel  through  the 
overlapped  boundaries.  As  shown  in  Figure  1,  the  zones  coincide  on  a  row  of  overlapped 
cells.  The  right-hand  side  boundary  of  zone  A  is  contained  within  zone  B  and  the  left- 
hand  side  boundary  of  zone  B  is  contained  within  zone  A.  Since  overlapped  cells  are  of  the 
same  shape  in  both  zones,  this  approach  requires  transfer  of  information  from  the  field  of 
zone  A  to  the  boundary  of  zone  B  and  vice-versa.  This  technique  does  not  introduce  any 
zonal  boundary  condition.  In  other  words,  the  continuity  in  the  solution  across  zones  was 
obtained  by  simply  exchanging  data. 

Each  multi-zone  solution  was  obtained  by  taking  one  time  step  in  each  zone  and 
then  exchanging  boundary  information  between  zones.  This  zone-coupling  technique  has 
worked  well  for  both  two-dimensional  and  three-dimensional  applications. 


4.  BOUNDARY  CONDITIONS  AND  TURBULENCE  MODEL 


The  free-stream  boundary  conditions  axe  held  at  the  appropriate  freestream  values 
for  the  duration  of  the  solution  procedure.  At  the  downstream  boundary,  the  conventional 
zero  gradient  boundary  condition  is  applied.  The  no  slip  boundary  condition  for  viscous 
flow  is  enforced  by  setting  the  contravarient  velocities  to  zero  on  the  body  surface.  At  the 
outer  boundary,  a  no- reflection  boundary  condition  is  applied  (Appendix  A).  This  feature 
of  the  code  enables  solutions  for  supersonic  flow  to  be  achieved  using  a  minimum  of  flow 
field  grid  nodes  since  the  outer  flow  field  can  be  significantly  truncated.  This  feature  is 
especially  important  for  long  length/diameter  bodies. 

For  nonspinning  projectiles  at  an  angle  of  attack,  symmetry  exists  about  the  projectile 
axis;  therefore,  the  computation  is  performed  only  over  the  half  body.  Thus,  symmetry 
boundary  conditions  are  imposed  at  both  ends  in  the  circumferential  direction.  The  pres¬ 
sure  on  the  body  surface  is  obtained  by  applying  a  normal  pressure  boundary  condition 
using  the  momentum  equations.  For  the  body  at  tj  =  const.,  the  normal  pressure  boundary 
condition  is: 


(*7x  d"  T]y  -)-  P  z)  Pv  {(Cx^x  T  Cy*?y  ~b  C zPz)  P(  d“  (£x*?x  d*  £yPy  d"  £zPz)  Pt 
+pu  (rjxU£  +  pzw^  +  rjyv <;)  +  pW  (7/ruc  +  pzW(_  -)-  7„vc)}  . 


(14) 


For  viscous  flow,  U  =  W  =  0  is  used  in  the  above  equation. 

The  temperature  on  the  body  surface  is  computed  using  the  adiabatic  wall  condition. 
The  initial  condhion  is  prescribed  using  the  freestream  condition. 

For  the  computation  of  turbulent  flows,  a  turbulence  model  must  be  supplied.  In  the 
present  calculations,  a  two-layer  algebraic  eddy  viscosity  model  due  to  Baldwin  and  Lomax8 
is  used.  Following  shadowgraph  observations,  transition  to  turbulence  from  laminar  to 
turbulent  viscous  flow  has  been  initiated  on  the  ogive  at  0.5  calibers  downstream  of  the  nose 
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tip.  The  transition  from  laminar  to  fully  turbulent  viscous  effects  was  imposed  gradually 
through  an  axial  distance  of  0.1  caliber. 


IV.  MODEL  GEOMETRY  AND  EXPERIMENT 


One  means  of  validating  a  computer  code  for  a  particular  class  of  problem  is  through 
comparison  with  available  experimental  data.  The  model  used  for  the  experimental  and 
computational  study  presented  here  is  an  idealization  of  a  realistic  projectile  nose  geometry. 
A  sketch  of  the  model  is  shown  in  Figure  2.  The  model  consists  of  a  flat-faced  nose,  a 
three-caliber  ogive,  and  a  six-caliber  cylindrical  section.  The  experimental  data6,7  used 
for  comparison  in  this  report  were  obtained  in  the  Princeton  University  supersonic,  high 
Reynolds  number,  blow-down  tunnel.  At  the  test  section  the  value  of  was  2.95.  The 
supply  header  conditions  were  Pq  =  14400  lbf/ft2,  Tt  =  468°R,  giving  a  nominal  test  section 
Re  =  1.9  xlO7  per  foot.  The  pressure  and  velocity  profile  data  were  obtained  at  0°  and 
2.9°  Jingle  of  attack  at  the  wind  and  lee-sides. 


V.  COMPUTATIONAL  RESULTS 

All  computations  were  performed  on  the  CRAY-2  supercomputer  at  BRL.  As  men¬ 
tioned  earlier,  the  solution  technique  involves  solving  the  3-D,  time  dependent,  full  Navier- 
Stokes  equations.  The  procedure  is  started  by  assuming  uniform  free-stream  conditions  for 
all  field  points.  A  slow  start  of  the  boundary  condition  is  implemented  and  the  calculation 
marches  in  time  until  a  steady  state  solution  is  obtained.  A  criterion  for  convergence  rate 
is  established  by  the  magnitude  of  the  variation  of  the  root  mean  square  of  the  residual  of 
the  continuity  equation  with  iteration.  Also,  the  surface  pressure  distribution  is  checked 
for  time  invariance.  For  this  application,  2000-3000  iterations  using  the  local,  variable, 
time  step  technique  were  found  to  provide  a  converged  solution. 

A  two  zone  grid  for  the  present  calculations  was  algebraically  generated.  Figure  3 
shows  the  surface  grid  with  the  lee  and  wind-side  planes.  The  grid  size  for  zone  1  and 
zone  2  were  (30x90x39)  and  (161x50x39),  respectively.  Thirty-nine  circumferential  planes 
were  placed  uniformly  giving  five  degrees  of  spacing  between  adjacent  planes.  In  the  radial 
direction,  grid  points  were  clustered  using  a  hyperbolic  tangent  function.  The  grid  spacing 
in  the  radial  direction  at  the  body  surface  was  2.564  x  10~4  calibers.  The  outer  boundary 
in  the  radial  direction  was  placed  at  0.5  caliber  from  the  body  surface.  An  expanded  view 
of  the  grid  in  the  nose  region  is  shown  in  Figure  4. 

The  flow  over  the  flat  nose  tip  consists  of  an  over-expansion  around  the  sharp  corner 
followed  by  a  recompression.  The  recompression  is  strong  enough  to  produce  an  oblique 
shock  wave.  The  flow  separates  in  the  streamwise  direction  in  the  vicinity  of  the  corner. 
Figure  5  is  a  wind  tunnel  shadowgraph  at  a  =  0°.  Computed  Mach  number  contours  for 
a  =  0°  are  shown  in  Figure  6.  These  compare  well  with  qualitative  features  of  the  flow 
field  shown  in  the  shadowgraph.  Note  that  the  detached  bow  shock  and  the  recompression 
shock  immediately  downstream  of  the  leading  edge  corner  were  successfully  captured  in 
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the  computations. 

Figure  7  shows  the  computed  and  measured  surface  pressure  distribution  for  a  =  0°. 
The  arc  distance  on  the  x  axis  is  measured  from  the  axis  of  symmetry  point  of  the  flat¬ 
faced  nose  tip.  Good  agreement  between  computation  and  experimental  data  is  achieved, 
including  the  region  of  over-expansion.  Figure  8  shows  computed  and  measured  velocity 
profiles  at  three  different  stations.  Stations  1,  2  and  3  are  located  at  X/D  =  2.86,  4.65  and 
5.93  from  the  flat  nose  tip,  respectively.  The  agreement  achieved  between  computation  and 
experiment  is  considered  to  be  highly  encouraging  but  not  fully  satisfactory.  The  largest 
discrepancy  between  computation  and  experiment  is  about  three  percent  of  the  free  stream 
velocity. 

Next,  the  a  =  2.9°  case  was  considered.  A  shadowgraph  for  this  flow  condition 
is  shown  in  Figure  9.  As  expected,  the  wind-side  recompression  shock  moves  upstream 
and  increases  in  strength  while  the  opposite  occurs  on  the  lee-side.  The  computed  Mach 
contours  for  this  case  are  shown  in  Figure  10.  The  experimentally  observed  shift  in  shock 
pattern  is  predicted.  Also,  the  computed  and  experimented  surface  pressure  distributions 
agree  well  (Figure  11)  for  both  the  wind  and  lee-sides. 

Figure  12  shows  computed  and  measured  velocity  profiles  for  the  lee-side  at  stations 
1,  2  and  3.  Figure  13  shows  comparisons  between  computed  and  measured  velocity  profiles 
for  the  wind-side.  Agreement  between  computation  and  experiment  is,  again,  considered 
to  be  highly  encouraging  but  not  fully  satisfactory.  The  largest  discrepancy  was  about 
seven  percent  of  the  free  stream  velocity  on  the  wind-side  and  about  five  percent  of  the 
free  stream  velocity  on  the  lee-side. 

The  results  at  the  down-stream  stations  are  strongly  affected  by  the  accuracy  of  the 
modeling  in  the  vicinity  of  the  bow  shock  which,  for  this  flat  nosed  case,  contained  local  flow 
separation  at  the  leading  edge  corner  of  the  ogive.  A  consistent  feature  of  the  disagreement 
between  computational  and  experimental  velocity  profiles  is  in  the  lower  portion  of  the 
viscous  layer  at  all  measurement  stations.  This  could  be  the  effect  of  a  deficiency  of  the 
turbulence  model  which  does  not  account  for  flow  field  history.  The  flow  field  history  effect 
resulting  from  the  extreme  expansion  around  the  flattened  nose  and  local  flow  separation 
could  be  significant.  Another  possibility  is  that  increased  grid  resolution  is  needed  near  the 
surface.  These  possibilities  will  be  addressed  in  future  studies.  Another  consideration  to 
be  kept  in  mind  is  that  of  probe-wall  interference  which  limits  the  accuracy  of  pitot-probe 
measurements  in  the  vicinity  of  a  wall. 

The  single  processor  CPU  time  for  a  laminax  solution  on  a  grid  consisting  of  419250 
grid  points  was  about  10  sec/iteration.  The  CPU  time  for  a  solution  including  turbulent 
viscous  effects  was  about  11  sec/iteration  for  a  solution  converging  for  4000  time  steps 
in  which  the  turbulence  quantities  were  recomputed  every  10  time  steps.  Values  for  the 
smoothing  parameters  were  determined  by  performing  trial  runs  using  the  axisymmetric 
version  of  the  code.  The  three-dimensional  computations  were  found  to  proceed  without 
any  special  user  interaction  by  using  the  predetermined  values  of  the  smoothing  parameters. 
Of  particular  significance  for  this  test  case  is  that  the  flattened  nose  tip  did  not  impose 
flow  field  convergence  problems. 
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VI.  PARALLEL  IMPLEMENTATION  AND 

PERFORMANCE 


The  mapping  of  complex  physical  zones  onto  processors  is  an  important  issue  for 
efficient  implementation  on  parallel  computers.  The  partitioning  across  processors  for 
3D  zonal  applications  can  be  done  in  the  coarse  to  medium  granularity  range  on  shared 
memory  processors.  In  the  coarse  granularity  approach,  a  zone  or  subregion  of  a  zone 
would  be  mapped  on  each  processor.  In  general,  the  approach  chosen  will  be  application 
dependent.  When  a  number  of  zones  and/or  size  of  zones  change  from  one  application  to 
another,  a  new  mapping  procedure  may  be  required.  Also,  it  will  be  difficult  to  evenly 
distribute  work  among  the  processors  when  zone  sizes  vary  by  considerable  magnitude. 
This  would  result  in  inefficient  load  balancing  among  the  processors.  In  other  words,  some 
processors  will  finish  their  work  faster  than  others  and  will  be  idle. 

In  the  medium  granularity  approach,  all  processors  work  on  a  single  zone  in  parallel. 
This  parallelization  consists  of  decomposing  a  zone  at  DO-loop  level.  For  a  3D  zone, 
each  processor  can  be  assigned  to  work  on  a  2D  plane.  Since  no  artificial  boundaries  are 
introduced  on  shared  memory  machines,  this  approach  generally  results  in  better  load 
balancing  than  the  coarse  granularity  approach.  On  Cray  multiprocessors,  this  can  be 
implemented  using  microtasking  directives  such  as  DO  GLOBAL.  On  Crays,  an  inner  DO- 
loop  of  a  2D  plane  can  be  executed  in  vector  mode.  Also,  this  approach  is  a  natural  way 
to  partition  a  3D  zone  because  it  maintains  the  physical  zone  intact. 

There  are  two  related  issues  to  be  addressed  for  efficient  implementation  of  a  solu¬ 
tion  algorithm  on  shared  memory  multiprocessors:  (1)  synchronization  overhead;  and  (2) 
convergence  rate.  The  synchronization  overhead  arises  when  processors  have  data  depen¬ 
dencies  which  cause  them  to  be  idle.  This  can  happen  when  global  information  is  required 
to  continue  the  solution  process.  For  the  present  algorithm,  this  situation  arises  when 
one  uses  a  global  minimum  time  step  size  for  time  dependent  calculations.  Generally,  the 
convergence  rate  of  a  parallel  algorithm  is  related  to  synchronization  overhead.  In  order 
to  keep  synchronization  overhead  to  a  reasonable  magnitude,  many  parallel  algorithms 
utilize  readily  available  information  instead  of  the  best  possible  information.  An  example 
of  this  is  that  Jacobi  and  multi-color  iterative  methods  are  preferred  over  the  Gauss-Seidel 
iterative  method  for  parallel  implementation.  In  the  present  implementation  substantial 
synchronization  overhead  can  arise  when  using  a  global  minimum  time  step  size  for  time 
integration.  However,  when  only  steady  state  solutions  are  of  interest,  the  convergence 
rate  can  be  improved  by  using  a  local  maximum  time  step  size  (Section  III).  Fortunately, 
the  local  time  step  size  procedure  is  highly  parallel.  The  use  of  a  local  time  step  size  not 
only  eliminates  the  synchronization  overhead  associated  with  time  step  calculations,  but 
improves  the  rate  of  convergence  of  the  solution  algorithm. 

The  parallel  implementation  of  each  time  step  integration  can  be  summarized  as 
follows.  For  the  explicit  method  (Section  III)  with  local  time  step  size,  the  integration 
algorithm  reduces  to  the  classical  fork  and  join  procedure  of  parallel  implementations.  We 
must  have  all  information  on  the  RHS  of  Equation  (8)  available  before  we  can  fork  or  start 
solving  Equation  (8)  in  parallel.  We  must  finish  solving  Equation  (8)  for  every  interior 
grid  point  for  all  zones  (join)  before  we  can  fork  and  start  forming  the  . actors  fin  the 
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RHS  of  Equation  (9).  The  fork  and  join  procedure  is  used  not  only  for  solving  Equations 
(8  and  9),  but  also  for  computing  flux  vectors,  damping  terms,  time  step  size  and  so  forth. 
Thus,  the  overall  solution  algorithm  is  highly  parallel. 

To  evaluate  the  performance  of  the  algorithm  for  an  application  that  requires  sub¬ 
stantial  memory,  the  complex  3D  flow  for  a  flat-nosed  projectile  discussed  previously  has 
been  considered.  The  cost-effectiveness  or  the  efficiency  of  the  parallel  implementation  is 
measured  in  terms  of  speed-up  or  wall-clock  execution  time.  The  speed-up  is  defined  as 
the  ratio  of  the  single  processor  execution  time  of  a  given  algorithm  to  that  of  multipro¬ 
cessor  execution  time  of  the  same  algorithm.  Figure  14  shows  the  speed-up  of  the  parallel 
implementation  on  Cray-2.  The  speed-up  of  3.8  was  achieved  for  this  application  on  four 
processors  of  the  Cray-2.  This  translates  into  98.5%  parallel  efficiency.  Figure  15  shows 
the  speed-up  achieved  on  eight  processors  of  the  Cray  Y-MP.  The  code  achieved  a  speed-up 
of  6.8  on  eight  processors.  Again,  this  indicates  a  parallel  efficiency  of  97%  on  the  Cray 
Y-MP.  The  speed-up  on  the  Cray-2  and  Cray  Y-MP  supercomputers  were  obtained  by 
explicitly  incorporating  microtasking  directives,  not  by  using  an  autotasking  preprocessor. 
The  maximum  efficiency  that  could  be  achieved  for  this  application  may  be  higher  than 
reported  here  due  to  the  fact  that  several  computations  such  as  boundary  conditions,  zonal 
overlap  exchange,  etc.,  were  not  executed  in  parallel. 


VII.  CONCLUDING  REMARKS 

This  report  has  described  the  development  of  a  three  dimensional,  unsteady,  zonal, 
Navier-Stokes  code  which  has  been  structured  to  efficiently  utilize  the  vector /parallel  ar¬ 
chitecture  of  modern  supercomputers.  The  code  has  been  applied  to  predict  the  flow 
field  around  a  blunt-nosed  projectile  at  supersonic  speeds.  Computational  predictions 
were  compared  to  experimental  data  for  surface  pressure  distributions  and  velocity  pro¬ 
files.  The  results  achieved  indicate  that  the  zonal  topology  provides  accurate  and  efficient 
computational  performance  for  the  flattened  (meplate)  nose  shape. 

Although  theoretical  analysis  and  computations  for  ideal  problems  can  indicate  some 
general  trends,  the  achievement  of  efficiency  for  computational  execution  is  best  demon¬ 
strated  for  a  realistic  3D  application.  This  study  has  shown  the  potential  to  reduce  wall- 
clock  time  by  tasking  idle  processors.  For  many  similar  3D  zonal  applications  on  present 
supercomputers  this  can  result  in  substantially  enhanced  turn-around  times. 

Future  development  of  this  code  will  be  to  extend  the  capabilities  to  perform  com¬ 
putations  for  highly  complex  configurations  such  as  guided,  non-axisymmetric  and  finned 
body  shapes. 
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Figure  1.  Schematic  illustration  of  single  grid-cell  overlap. 
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Figure  2.  Schematic  of  the  model. 


Figure  3  Body  surface  and  pitch  plane  computational  grid. 


Figure  4.  Pitch  plane  computational  grid,  nose  region. 
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Wind  tunnel  spark  shadowgraph  of  How  over  blunt  model,  M  =  2.95,  « 


Figure  6.  Computational  results,  Mach  contour  in  nose  region,  a  =  0.0°. 
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Figure  7.  Surface  pressure  distribution,  computation  compared  to  experiment,  a  =  0.0°. 
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Figure  8.  u-velocity  profile,  computation  compared  to  experiment, 
a  =  0.0°,  XI D  =  2.86,  4.65  and  5.93. 


Figure  9.  Wind  tunnel  spark  shadowgraph  of  flow  over  blunt  model,  M  =  2.95,  o  =  2.9°. 


Figure  10.  Computational  results,  Mach  contour  in  nose  region,  a  =  2.9°. 
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Figure  11.  Surface  pressure  distribution,  computation  compared  to  experiment, 
a  =  2.9°,  wind  and  lee-side. 
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Figure  13.  u-velocity  profile,  computation  compared  to  experiment, 
a  =  2.9°,  X/D  =  2.86,  4.65  and  5.93,  wind-side. 
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Figure  14.  Speedup  on  Cray-2  -  3D  microtasked  code. 
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igure  15.  Speedup  on  Cray  Y-MP  3D  zonal  code. 
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APPENDIX  A:  NON-REFLECTING  OUTER  BOUNDARY  CONDITION 


This  appendix  describes  the  derivation  of  the  non-reflection  boundary  condition  em¬ 
ployed  along  the  top  boundary.  This  boundary  condition  is  consistent  when  the  following 
conditions  are  met. 

(1)  The  local  Mach  number  is  greater  than  one  at  the  boundary. 

(2)  There  is  only  one  outwards  running  Mach  line.  This  requirement  is  met  if  the 
local  normal  Mach  number  at  the  boundary  is  less  than  one. 

Consider  a  nonorthongonal  coordinate  system  as  shown  in  Figure  Al.  Through  point 
x,y  one  can  draw  the  velocity  vector  and  the  Mach  wave  inclined  at  an  angle  y.  with 
respect  to  the  velocity  vector.  The  velocity  vector  is  at  an  angle  u>  measured  with  respect 
to  Cartesian  coordinate  x.  This  can  be  indicated  as: 


and 


which  leads  to  a  toted  included  angle 

0  =  fl  +  LJ. 

Let 

/  =  f(p,u,v,T)T. 

The  non-reflection  condition  requires  that  the  flow  variables  remain  constant  along 
the  left  running  Mach  lines. 


fi,JL  —  fx,y 


which  gives 


where 


Jx,y  —  fk-l,JL-\  4 


_ A _ 

B{fk,JL- 1  —  fk-\,JL-\) 


-  xk-ui-i)2  +  (y  -  yk-i,jL-\ 


B  =  (Ax2  4-  Ay2)  2 


Ax  =  Xk,JL~\  ~  Xk-l,JL-\ 
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and 


Ay  =  ykjL- 1  -  yk—\,jL—\ 


*  = tan''  (ff) 

(Vk,JL- 1  ~  Vi,JL  +  xul  tan  6  —  xk,jL-\  tan  $) 
(tan  6  —  tan  $) 


y  =  Vi,JL  +  tan#  (x  -  xi<JL)  ■ 
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Picatinny  Arsenal,  NJ  07806-5000 

1  Director 

Benet  Weapons  Laboratory 
Armament  RD&E  Center 
US  Army  AMCCOM 
ATTN:  SMCAR-LCB-TL 
Watervliet,  NY  12189-4050 

1  Commander 

US  Army  Armament,  Munitions 
and  Chemical  Command 
ATTN:  SMCAR-ESP-L 
Rock  Island,  IL  61299-5000 

1  Commander 

US  Army  Aviation  Systems  Command 
ATTN:  AMSAV-DACL 
4300  Goodfellow  Blvd. 
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ATTN:  Dr.  W.L.  Oberkampf 
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1  Massachusetts  Institute  of 

Technology 
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77  Massachusetts  Avenue 
Cambridge,  MA  02139 
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NASA  Ames  Research  Center 
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ATTN:  Wilford  E.  Martwick 
Ken  Sundeen 
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ATTN:  Dr.  Michael  J.  Hemsch 
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1  Applied  Technology  Associates 

ATTN:  Mr.  R.J.  Cavalleri 
P.O.  Box  19434 
Orlando,  FL  32814 

1  United  States  Military  Academy 

Department  of  Mechanics 
ATTN:  LTC  Andrew  L.  Dull 
West  Point,  NY  10996 
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